The RESP AI model accelerates the identification of tight-binding antibodies

High-affinity antibodies are often identified through directed evolution, which may require many iterations of mutagenesis and selection to find an optimal candidate. Deep learning techniques hold the potential to accelerate this process, but existing methods cannot provide the confidence intervals or uncertainty estimates needed to assess the reliability of their predictions. Here we present a pipeline called RESP for efficient identification of high-affinity antibodies. We develop a learned representation trained on over 3 million human B-cell receptor sequences to encode antibody sequences. We then develop a variational Bayesian neural network to perform ordinal regression on a set of the directed evolution sequences binned by off-rate and quantify their likelihood to be tight binders against an antigen. Importantly, this model can assess sequences not present in the directed evolution library and thus greatly expand the search space to uncover the best sequences for experimental evaluation. We demonstrate the power of this pipeline by achieving a 17-fold improvement in the KD of the PD-L1 antibody Atezolizumab, and this success illustrates the potential of RESP in facilitating general antibody development.

demonstrated that all of these sequences bind with KD < 20 nM, and all of them are indeed predicted to be binders by our model. c) The model's uncertainty regarding its assigned score for predictions on the test set. Incorrect predictions have higher associated uncertainty than correct predictions, indicating that, as anticipated, model-assigned uncertainty can help determine whether a prediction should be considered reliable. There are 3,158 unique sequences in this test set. Significance was assessed using a two-sided Mann-Whitney U test; the resulting p-value was 1e-23.
The following conventions apply for each boxplot. The lower and upper bounds of the box are the 25th and 75th percentiles of the data, and the whiskers are drawn at 1.5x the interquartile range (the distance from the 25th percentile to the 75th percentile). The center is drawn at the median of the data, and the "notch" represents the 95% confidence interval on the median (as determined by nonparametric bootstrap). The diamonds represent "flier" points, which lie outside 1.5x the interquartile range. Four asterisks indicate that the p-value is < 0.0001. Source data is provided as a source data file.
Figure 11: Clustering of simulated annealing results. The dendrogram for clustering of sequences harvested from the simulated annealing procedure after sequences with scores associated with wide confidence intervals had been removed. The dendrogram suggests the presence of at least two main regions of sequence space identified by the modified simulated annealing algorithm. Source data is provided as a source data file.
Gene gac ata caa atg act caa agc ccg agt tcc cta tct gcg tct gtt ggg gac cgt gtt acg att acg tgt aga gca tca caa gac gtt tca act gcg gta gca tgg tac caa caa aag ccc ggt aag gca ccc aag cta ctg atc tat agc gca tct ttt ctt tac agt ggt gtt ccg agc agg ttc agt gga tca ggg tca ggc act gat ttc acc ttg acc atc tcc agt ctg caa ccg gaa gat ttc gca act tat tat tgc caa caa tat ttg tat cac cca gct acg ttc ggt caa ggt aca aag gta gaa atc aaa cgt ggg gga ggt gga tct ggg ggc ggt ggg tcc gga gga gga gga tca ggc ggt ggc gga tct gaa gtc cag ctt gta gag agt ggg ggc gga ttg gtc cag ccc ggt ggg tca ttg cgt tta tca tgt gca gcc agt ggg ttc acc ttt agc gac agt tgg atc cat tgg gta aga cag gct cca ggc aag ggc tta gaa tgg gta gca tgg ata agc ccc tac ggg gga tct act tat tac gct gat agt gta aag gga cgt ttc aca gca tca gct gat acg tct aag aac acc act tac cta caa atg aac tcc ctt aga gcc gaa gat aca gcc gtt tac tac tgt gtt cgt cgt cat tgg ccc gga ggg ttt gat tat tgg ggg caa ggc aca ttg gta aca gtg agt agt

S2.1 Sequence Quality
Read pairs that contained one or more bases with a Phred quality score < 10 were discarded, since such reads may be unreliable. If the overlapping region of the paired ends did not match, both reads were discarded, so that no mismatches between the paired ends were allowed. Reads that met these quality criteria were merged and translated to yield mutant atezolizumab sequences 118 amino acids in length. Each sequence that occurred more than once in any given category was assigned a frequency for that category indicating the number of times it was found. Sequences with mutations in the first 30 or last 8 positions were excluded from further consideration: these positions were not targeted for mutation, and mutations there were too rare for their importance to be assessed.
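The filtering steps above can be sketched in a few lines of Python. This is an illustrative outline only, not the pipeline used in this work: the function names are hypothetical, quality strings are assumed to use the common Phred+33 encoding, and the length of the paired-end overlap is assumed to be known in advance.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def min_phred(quality, offset=33):
    """Lowest Phred score in a FASTQ quality string (Phred+33 is an assumption)."""
    return min(ord(ch) - offset for ch in quality)


def merge_pair(fwd, rev, overlap):
    """Merge a read pair; return None if the overlapping bases disagree."""
    rev_rc = rev.translate(COMPLEMENT)[::-1]
    if fwd[-overlap:] != rev_rc[:overlap]:
        return None
    return fwd + rev_rc[overlap:]


def translate(dna):
    """Translate an in-frame DNA sequence codon by codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def mutations_in_targeted_region(protein, wild_type, n_head=30, n_tail=8):
    """True if every difference from wild type avoids the excluded ends."""
    last_allowed = len(wild_type) - n_tail
    return all(n_head <= i < last_allowed
               for i, (a, b) in enumerate(zip(protein, wild_type)) if a != b)


def process_reads(read_pairs, wild_type, overlap, quality_cutoff=10):
    """Count accepted protein sequences for one sort category."""
    counts = Counter()
    for fwd, fwd_qual, rev, rev_qual in read_pairs:
        if min(min_phred(fwd_qual), min_phred(rev_qual)) < quality_cutoff:
            continue  # at least one base call below the quality cutoff
        merged = merge_pair(fwd, rev, overlap)
        if merged is None:
            continue  # paired ends disagree in the overlap
        protein = translate(merged)
        if len(protein) == len(wild_type) and mutations_in_targeted_region(protein, wild_type):
            counts[protein] += 1
    return counts
```

Calling `process_reads` once per sort category yields the per-category frequencies described above; a real pipeline would also need to locate the overlap rather than take it as a parameter.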

S2.2 Generation of the WT and Mutant Atezolizumab scFv Library
To test the function of the WT scFv in the yeast display format, the gene for WT Atezolizumab scFv 1 was purchased as a geneblock (using codons optimized for yeast) from Integrated DNA Technologies (IDT). The geneblock was PCR-amplified with the AtezF/AtezR primers (Supplemental Table S2) and cloned into a modified pYD1 (Addgene #73447) yeast display vector by double digestion/ligation using Golden Gate Assembly 2,3 (Esp3I (Thermo Scientific) and T4 DNA Ligase (New England Biolabs (NEB))). The construct was sequence verified before transformation into EBY100 yeasts via 4 .
For the Atezolizumab library, the WT scFv in the pYD1 vector was used as the template for either high-fidelity PCR of the light chain region with Q5 hotstart DNA polymerase (NEB) (primers IF1F/IF1R) or error-prone PCR of the heavy chain region with Taq DNA polymerase (Invitrogen 18038-018, primers IF2F/IF2R) using 8-oxo-dGTP (TriLink N-2034-1) and dPTP (TriLink N-2037-1) as described 5 , except that 30 cycles of error-prone PCR with 200 µM each dPTP and 8-oxo-dGTP (final concentration 20 µM each) were used to increase the mutation rate. The PCR products of the light and heavy chains were concentrated with the DNA Clean and Concentrator-5 kit (DCC-5, Zymo Research) and purified with an agarose gel extraction kit (Zymo Research). The two DNA fragments (WT light chain, mutated heavy chain) were assembled into a single DNA molecule by overlap extension PCR using the OF/OR primers (all primers in Table S2) with Q5 hotstart DNA polymerase; the resulting product was again concentrated with the DCC-5 kit and purified by gel extraction. To examine the mutation rate in the heavy chain, the product was 3'-A-tailed with Taq and TA cloned (Invitrogen TA Cloning Kit, K202020); sequencing of randomly chosen clones revealed 0-8 mutations per gene. In preparation for yeast electroporation, the pYD1 vector was double digested with XhoI and EcoRI-HF, concentrated with the DCC-5 kit, and gel extracted/eluted into ddH2O.
The scFv library was transformed into yeast by electroporation (3 μg digested pYD1, 9 μg scFv gene per electroporation) and assembled in the EBY100 cells (ATCC MYA-4941) by homologous recombination 6 , giving 7.8 x 10^7 transformants as determined by serial dilution onto plates with selective growth media (16.7 g/L BD bactoagar for solid media (see 7 for more on media recipe)). The library was passaged several times in selective growth media to ensure 1 plasmid/cell before being frozen in aliquots at -80°C in 85% ddH2O, 10% glycerol, 5% DMSO (3.6 x 10^8 cells/tube). As a control for sorting, the WT scFv gene/pYD1 was also electroporated into EBY100 so as to be in the exact same position in the vector as the library.

S2.3 Atezolizumab scFv Library Screening by Yeast Surface Display
First, in order to determine the optimal competition time, the WT scFv koff (off-rate) towards PD-L1 was determined essentially as previously described 8 ; see Figure S12 for an example FACS plot. Sort gates were set to collect mutants with faster, WT-level, or slower off-rates to PD-L1 (see Figure S12 for an example of gating). Hits were collected in selective growth media, grown to high density at 30°C, and aliquoted to frozen stocks at -80°C. The process was repeated twice for binders with faster or WT-level off-rates and four times for the mutants with slower off-rates.

S2.4 Generation and Screening of the Focused 21-Mutant Library
Twenty-one geneblocks coding the 21 heavy chain mutants were purchased from IDT, followed by PCR amplification and overlap extension PCR (Q5 Hotstart DNA Polymerase) to generate 21 full-length Atezolizumab scFv mutant genes. The sequences of the 21 heavy chain mutants appear in Table S3 of the Supplementary Info. The genes were electroporated (along with linearized pYD1) into EBY100 yeasts to assemble the focused library, and the library was passaged/frozen as described for the initial library. Titration onto selective media plates revealed >10^7 transformants.
In order to screen the library for the most improved (slowest off-rate) mutants, increasingly long competition times were used in the presence of excess unlabelled PD-L1: after initially saturating the WT or mutant library with 200 nM biotin-PD-L1, the cells were pelleted/washed to remove unbound biotin-PD-L1 and resuspended in excess unlabelled PD-L1 (128 nM) for 7.5h, 17h, then 39h at RT. The most intense clones were collected by FACS (using the WT scFv as a reference) in growth media and propagated to the next round of sorting. After 3 rounds of selection, plasmids from the hits were harvested by yeast miniprep and transformed into GC10 competent cells; individual colonies were picked and subjected to bacterial miniprep. Plasmids were sequenced using Eton Bioscience sequencing services.
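A short calculation illustrates why longer competition times favor clones with slower off-rates. The sketch below assumes simple first-order dissociation with no rebinding of labelled antigen, so the labelled fraction after time t is exp(-koff*t). The koff values are hypothetical placeholders chosen only for illustration and are not measurements from this work; only the three competition times come from the text.

```python
import math

COMPETITION_HOURS = (7.5, 17.0, 39.0)  # competition times used in the three rounds


def fraction_labelled(koff_per_s, hours):
    """Fraction of scFv still bound to biotin-PD-L1 after competition."""
    return math.exp(-koff_per_s * hours * 3600.0)


def enrichment_ratio(koff_slow, koff_fast, hours):
    """Signal of a slow-dissociating clone relative to a faster one."""
    return fraction_labelled(koff_slow, hours) / fraction_labelled(koff_fast, hours)


if __name__ == "__main__":
    # Hypothetical off-rates (1/s), for illustration only.
    koff_reference, koff_improved = 2e-5, 2e-6
    for hours in COMPETITION_HOURS:
        ratio = enrichment_ratio(koff_improved, koff_reference, hours)
        print(f"{hours:5.1f} h: improved/reference signal ratio = {ratio:.1f}")
```

Under these assumptions the ratio grows exponentially with competition time, which is why each successive round can use a longer incubation to separate the slowest-dissociating clones from the rest.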

S2.5 Determination of the WT/Mutant 4/Durvalumab/Avelumab KD and koff Values on the Surface of Yeast
The RT koff (off-rate) of the scFv-PD-L1 complex was determined essentially as described 8 using biotin/non-biotin PD-L1 (same versions used during library screening), with TBS-BSA as the buffer and the same antibody reagents used for library screening (also see section S2.3). The yeasts were grown/induced in the same manner as with the screening protocol and labeled with 200 nM b-PD-L1 at 4 x 10^7 cells/mL, pelleted/washed, followed by 64 nM unlabelled PD-L1 (at 10^7/mL) and rotated at RT. The mean fluorescence intensity (MFI) of the V5 positive yeasts was recorded at each time point and the data were fit to a one phase decay model with GraphPad Prism 9.3.0 software: Y = (Y0 - Plateau)*exp(-K*X) + Plateau, where Y is the fraction of yeast bound to biotin-PD-L1, X is time, Y0 = 1 (the fraction of yeast bound to biotin-PD-L1 at the 0 s competition time point), Plateau is a constant based on nonspecifically bound yeast, and K is koff. The KD on the yeast surface was determined as previously described 5 , where the MFI of the V5+ population was plotted vs. the antigen concentration; the biotin-PD-L1 was the same as that used for the library screening. Important notes on the KD determination include using 10^5 cells per data point, using 1:200 anti-V5 antibody after resuspension of each data point in 100 μL TBS-BSA (cold) for 1 h, washing cells with cold TBS-BSA after V5 mAb incubation, then labeling cells with 1:100 SA-PE/IgG-AF647 for 20 minutes on ice, and washing cells in cold TBS-BSA and resuspension in cold TBS-BSA for flow cytometry. The data were fit to the following equation: y = Bmax*X/(KD + X), where Bmax is the maximum MFI value and X is the antigen concentration; the MFI without PD-L1 was subtracted from all values. Due to the very slow off-rate of Mutant 4, it was necessary to incubate the samples at RT for 6 days to allow the system to reach equilibrium.
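The two fits described above can be reproduced in outline with the Python standard library. This is a sketch under stated assumptions, not the procedure implemented in GraphPad Prism: instead of a general nonlinear least-squares solver, it exploits the fact that each model is linear in one parameter (Plateau or Bmax) once the rate or affinity constant is fixed, and then searches over the remaining constant on a log scale with a golden-section search. The helper names, search bounds, and the synthetic data in the example are all hypothetical.

```python
import math


def one_phase_decay(t, koff, plateau, y0=1.0):
    """Y = (Y0 - Plateau) * exp(-K * X) + Plateau, with K = koff."""
    return (y0 - plateau) * math.exp(-koff * t) + plateau


def one_site_binding(conc, bmax, kd):
    """Y = Bmax * X / (KD + X)."""
    return bmax * conc / (kd + conc)


def golden_section_min(f, lo, hi, tol=1e-9):
    """Locate the minimum of a function assumed unimodal on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2


def fit_koff(times, fractions, log10_bounds=(-8.0, 0.0)):
    """Fit koff and Plateau with Y0 fixed at 1; returns (koff, plateau)."""
    def profile(log_k):
        decay = [math.exp(-(10 ** log_k) * t) for t in times]
        # For a fixed koff the model is linear in Plateau.
        num = sum((y - e) * (1 - e) for y, e in zip(fractions, decay))
        plateau = num / sum((1 - e) ** 2 for e in decay)
        sse = sum((y - e - plateau * (1 - e)) ** 2 for y, e in zip(fractions, decay))
        return sse, plateau
    log_k = golden_section_min(lambda x: profile(x)[0], *log10_bounds)
    return 10 ** log_k, profile(log_k)[1]


def fit_kd(concs, signals, log10_bounds=(-3.0, 4.0)):
    """Fit KD and Bmax to background-subtracted MFI; returns (kd, bmax)."""
    def profile(log_kd):
        occupancy = [x / (10 ** log_kd + x) for x in concs]
        # For a fixed KD the model is linear in Bmax.
        bmax = sum(y * f for y, f in zip(signals, occupancy)) / sum(f * f for f in occupancy)
        sse = sum((y - bmax * f) ** 2 for y, f in zip(signals, occupancy))
        return sse, bmax
    log_kd = golden_section_min(lambda x: profile(x)[0], *log10_bounds)
    return 10 ** log_kd, profile(log_kd)[1]


if __name__ == "__main__":
    # Synthetic, noise-free data generated from made-up parameters.
    times_s = [0, 600, 1800, 3600, 7200, 14400]
    bound = [one_phase_decay(t, koff=5e-4, plateau=0.05) for t in times_s]
    print("koff, plateau:", fit_koff(times_s, bound))
    concs_nm = [0.1, 0.3, 1, 3, 10, 30, 100]
    mfi = [one_site_binding(x, bmax=1000.0, kd=2.0) for x in concs_nm]
    print("KD, Bmax:", fit_kd(concs_nm, mfi))
```

With noisy experimental data, a dedicated nonlinear least-squares routine that also reports confidence intervals would normally be preferable; the profiled search here is chosen only to keep the example dependency-free.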

S2.6 Cloning and Purification of WT/Mutant 4 Atezolizumab scFv
The genes encoding the scFv sequences were amplified by PCR out of the pYD1 vector with primers containing NcoI-HF and XhoI cut sites (scFv F/R, Table S2) using Q5 hotstart DNA polymerase (NEB). The PCR products and the pET27b(+) vector (69863-3, MilliporeSigma) were double digested; the vector was also dephosphorylated with Quick CIP (NEB). The products of the digestion were ligated using T4 DNA Ligase (NEB) and the ligation was transformed into NEB 5-alpha cells. The plasmids were purified using the Zippy Plasmid Miniprep Kit and sequence verified.